Integrating Dask, Kerchunk, Zarr and Xarray

imported on: 2024-03-20

This notebook is from a different repository in NASA’s PO.DAAC, the-coding-club.

The original source for this document is https://github.com/podaac/the-coding-club/blob/main/notebooks/SWOT_SSH_dashboard.ipynb

SWOT KaRIn Sea Surface Height (SSH) Cloud Data Loading, Subsetting, and Visualization

PO.DAAC, Jet Propulsion Laboratory, California Institution of Technology
Author: Ayush Nag

Background

  • This notebook allows you to visualize large amounts of simulated SWOT data in an interactive dashboard
    • Connects to PO.DAAC simulated SWOT collection
    • Reads data as zarr cloud-optimized data store
    • 173 GB in milliseconds
  • To run this notebook, the SWOT_SIMULATED_L2_KARIN_SSH_GLORYS_SCIENCE_V1_kerchunk_DEMO.json file made from the SWOT_to_kerchunk.ipynb notebook is needed.

User Workflow

  1. Connect to SWOT collection (currently ~17k granules)
    • Very fast using zarr/kerchunk (500ms-1s)
  2. Subset temporally or spatially
    • Ex: lat/lon, time, or mean.
    • Can be done easily with xarray
    • Fast operations since xarray can slice with coordinates
  3. Visualize
    • Analyze the data you selected
    • Interactive dashboard allows for panning around and zooming in to view full resolution swaths
    • Uses the familiar xarray interface; User can modify as needed and revisualize
    • Uses little system memory because of xarray + zarr + dask + datashader optimizations
  4. Save
    • S3 bucket transfer
      • Fastest method of data transfer
      • Connect to your own workspace/code
    • PODAAC data subscriber (https://github.com/podaac/data-subscriber)
      • Explore data in this notebook; Then use podaac-data-subscriber CLI interface to filter and download netCDF’s to local machine
import s3fs
import glob
import dask
import ujson
import hvplot
import cartopy
import requests
import matplotlib
import hvplot.dask
import hvplot.xarray

import os
import numpy as np
import panel as pn
import xarray as xr
import holoviews as hv
import cartopy.crs as ccrs
import panel.widgets as pnw
import matplotlib.pyplot as plt

from tqdm.notebook import tqdm
from dask.distributed import Client

hv.extension('bokeh')
pn.param.ParamMethod.loading_indicator = True

Start Dask cluster

  • Data can be read from s3 in parallel and used by hvplot
dask.config.set(**{'array.slicing.split_large_chunks': False})
client = Client()
client
2023-08-24 04:37:30,076 - distributed.diskutils - INFO - Found stale lock file and directory '/tmp/dask-worker-space/worker-yeean92m', purging
2023-08-24 04:37:30,077 - distributed.diskutils - INFO - Found stale lock file and directory '/tmp/dask-worker-space/worker-mhuxlvme', purging
2023-08-24 04:37:30,077 - distributed.diskutils - INFO - Found stale lock file and directory '/tmp/dask-worker-space/worker-cru4iqsj', purging

Client

Client-eeba1af6-4237-11ee-a35c-1ab0adcc49e5

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

Worker: 1

Comm: tcp://127.0.0.1:34899 Total threads: 1
Dashboard: http://127.0.0.1:34749/status Memory: 1.00 GiB
Nanny: tcp://127.0.0.1:44689
Local directory: /tmp/dask-worker-space/worker-xg3nws55

Worker: 2

Comm: tcp://127.0.0.1:33301 Total threads: 1
Dashboard: http://127.0.0.1:46561/status Memory: 1.00 GiB
Nanny: tcp://127.0.0.1:46859
Local directory: /tmp/dask-worker-space/worker-itfe0jiy

Worker: 3

Comm: tcp://127.0.0.1:36665 Total threads: 1
Dashboard: http://127.0.0.1:42357/status Memory: 1.00 GiB
Nanny: tcp://127.0.0.1:42787
Local directory: /tmp/dask-worker-space/worker-t56otwss

Get PO.DAAC s3 credentials (authenticate via NASA Earthdata login)

%%time
url = "https://archive.podaac.earthdata.nasa.gov/s3credentials"
creds = requests.get(url).json()
# TODO: print success message if HTTPS succeeded
CPU times: user 170 ms, sys: 12.8 ms, total: 183 ms
Wall time: 3.26 s

Open collection using Kerchunk/Zarr

%%time
data = xr.open_dataset(
    "reference://", engine="zarr", chunks={},
    backend_kwargs={
        "storage_options": 
        {"fo": "SWOT_SIMULATED_L2_KARIN_SSH_GLORYS_SCIENCE_V1_kerchunk_DEMO.json",
         "remote_protocol": "s3",
         "remote_options": {"anon": False, "key": creds['accessKeyId'], "secret": creds['secretAccessKey'], "token": creds["sessionToken"]}},
         "consolidated": False
    }
)
data.simulated_true_ssh_karin
CPU times: user 900 ms, sys: 94 ms, total: 994 ms
Wall time: 1.36 s
<xarray.DataArray 'simulated_true_ssh_karin' (cycle_num: 3, pass_num: 456,
                                              num_lines: 9866, num_pixels: 71)>
dask.array<open_dataset-70a24b787b07d7586a2871a6b3264085simulated_true_ssh_karin, shape=(3, 456, 9866, 71), dtype=float64, chunksize=(1, 1, 9866, 71), chunktype=numpy.ndarray>
Coordinates:
  * cycle_num        (cycle_num) float64 3.0 4.0 5.0
    latitude         (cycle_num, pass_num, num_lines, num_pixels) float64 dask.array<chunksize=(1, 1, 9866, 71), meta=np.ndarray>
    latitude_nadir   (cycle_num, pass_num, num_lines) float64 dask.array<chunksize=(1, 1, 9866), meta=np.ndarray>
    longitude        (cycle_num, pass_num, num_lines, num_pixels) float64 dask.array<chunksize=(1, 1, 9866, 71), meta=np.ndarray>
    longitude_nadir  (cycle_num, pass_num, num_lines) float64 dask.array<chunksize=(1, 1, 9866), meta=np.ndarray>
  * pass_num         (pass_num) float64 1.0 2.0 3.0 4.0 ... 579.0 580.0 583.0
Dimensions without coordinates: num_lines, num_pixels
Attributes:
    comment:        Height of the sea surface free of measurement errors.
    long_name:      sea surface height
    standard_name:  sea surface height above reference ellipsoid
    units:          m
    valid_max:      150000000
    valid_min:      -15000000
  • pass_num
    (pass_num)
    float64
    1.0 2.0 3.0 ... 579.0 580.0 583.0
    array([  1.,   2.,   3., ..., 579., 580., 583.])
  • comment :
    Height of the sea surface free of measurement errors.
    long_name :
    sea surface height
    standard_name :
    sea surface height above reference ellipsoid
    units :
    m
    valid_max :
    150000000
    valid_min :
    -15000000
  • # Note: this represents the amount of data needed in memory to load the entire dataset
    # The actual files are smaller on disk (s3) because of netCDF compression
    print(f'{data.nbytes / 1e9} GB')
    385.573386456 GB

    Visualize with interactive dashboard

    def dashboard(data: xr.Dataset, x="longitude", y="latitude", groupby=["cycle_num", "pass_num"]):
        """
        Creates dashboard that can visualize SWOT collection
        """
        # --- WIDGETS --- #
        # only keep the vars that are in swath format
        plotting_vars = [var for var in data.data_vars if set(data[var].dims) == {'cycle_num', 'pass_num', 'num_lines', 'num_pixels'}]
        var_select = pnw.Select(value='simulated_true_ssh_karin', options=plotting_vars, name="variable")
        # var_select = pnw.Select(value='simulated_true_ssh_karin', options=list(data.variables), name="variable")
        
        cmap = pnw.Select(value='jet', options=['jet', 'RdBu', 'viridis', 'plasma', 'inferno', 'magma', 'cividis'], name="colormap")
        agg_fn = pnw.Select(value='mean', options=['count','sum','min','max','mean','var','std'], name="aggregration function")
        normalization = pnw.Select(value='linear', options=['linear', 'eq_hist'], name="normalization")
        projection = pnw.Select(value='PlateCarree', options=['PlateCarree', 'SouthPolarStereo', 'NorthPolarStereo'], name="projection")
        # fixes certain projections to a zoomed in starting point (otherwise plot is zoomed out to show the whole swath)
        projection_xlim = {'PlateCarree': None, 'SouthPolarStereo': (-180, 180), 'NorthPolarStereo': (-180, 180)}
        projection_ylim = {'PlateCarree': None, 'SouthPolarStereo': (-30, -90), 'NorthPolarStereo': (30, 90)}
        coastlines=pnw.Checkbox(name="coastlines", value=True, width=65)
        colorbar=pnw.Checkbox(name="colorbar", value=True, width=60)
        grid=pnw.Checkbox(name="grid", value=False, width=60)
        # z_slider = pnw.IntSlider(name="z", start=0, end=data.z.size - 1, step=1, value=0, width=500)
    
        # def time_boxes(z):
            # return pn.Column(pnw.StaticText(name='start', value=str(data.time[z].values[0])), pnw.StaticText(name='end', value=str(data.time[z].values[data.num_lines.size - 1])))
    
        # --- INTERACTIVE PLOT --- #
        def data_plot(var, proj):
            return data[var].hvplot.points(
                    x=x, xlabel=x,
                    y=y, ylabel=y,
                    groupby=groupby,
                    # widgets={"z": z_slider},
                    widget_location="bottom",
                    aggregator=agg_fn, cnorm=normalization,
                    cmap=cmap, colorbar=colorbar,
                    coastline=coastlines, grid=grid,
                    geo=True, projection=proj,
                    # tools=['tap', 'wheel_zoom'],
                    width=650, height=450,
                    rasterize=True, project=True, dynspread=True,
                    xlim=projection_xlim[proj], ylim=projection_ylim[proj],
                    title=f"PODAAC SWOT L2 Dashboard ({str(var)})"
                )
    
        # bind functions to widgets (like update handlers)
        idata_plot = pn.bind(data_plot, var=var_select, proj=projection)
        # itime_boxes = pn.bind(time_boxes, z=z_slider)
    
        # --- DEFINE LAYOUT --- #
        # return pn.Row(pn.Column(idata_plot), pn.Column(var_select, agg_fn, normalization, cmap, projection, itime_boxes, pn.Row(coastlines, colorbar, grid)))
        return pn.Row(pn.Column(idata_plot), pn.Column(var_select, agg_fn, normalization, cmap, projection, pn.Row(coastlines, colorbar, grid)))
    import warnings
    warnings.filterwarnings('ignore')
    %time dashboard(data, groupby=['cycle_num', 'pass_num'])
    WARNING:param.main: Calling the .opts method with options broken down by options group (i.e. separate plot, style and norm groups) is deprecated. Use the .options method converting to the simplified format instead or use hv.opts.apply_groups for backward compatibility.
    CPU times: user 3.28 s, sys: 224 ms, total: 3.5 s
    Wall time: 3.38 s

    Note: if you get the error DataError: XArrayInterface expects gridded data. This means you need to refresh your PODAAC s3 credentials and reopen the dataset

    Example: Plot region within swath

    %%time
    small_ds = data.sel(cycle_num=4, pass_num=2)
    # mask_lon = (small_ds.longitude >= -15) & (small_ds.longitude <= -10)
    mask_lat = (small_ds.latitude >= -45) & (small_ds.latitude <= -40)
    
    cropped_ds = small_ds.where(mask_lat.compute(), drop=True)
    cropped_ds
    CPU times: user 262 ms, sys: 6.67 ms, total: 268 ms
    Wall time: 523 ms
    <xarray.Dataset>
    Dimensions:                                (num_lines: 304, num_pixels: 71,
                                                num_sides: 2)
    Coordinates:
        cycle_num                              float64 4.0
        latitude                               (num_lines, num_pixels) float64 dask.array<chunksize=(304, 71), meta=np.ndarray>
        latitude_nadir                         (num_lines) float64 dask.array<chunksize=(304,), meta=np.ndarray>
        longitude                              (num_lines, num_pixels) float64 dask.array<chunksize=(304, 71), meta=np.ndarray>
        longitude_nadir                        (num_lines) float64 dask.array<chunksize=(304,), meta=np.ndarray>
        pass_num                               float64 2.0
    Dimensions without coordinates: num_lines, num_pixels, num_sides
    Data variables: (12/91)
        ancillary_surface_classification_flag  (num_lines, num_pixels) float32 dask.array<chunksize=(304, 71), meta=np.ndarray>
        correction_flag                        (num_lines, num_pixels) float32 dask.array<chunksize=(304, 71), meta=np.ndarray>
        cross_track_angle                      (num_lines, num_pixels) float64 dask.array<chunksize=(304, 71), meta=np.ndarray>
        cross_track_distance                   (num_lines, num_pixels) float32 dask.array<chunksize=(304, 71), meta=np.ndarray>
        dac                                    (num_lines, num_pixels) float32 dask.array<chunksize=(304, 71), meta=np.ndarray>
        depth_or_elevation                     (num_lines, num_pixels) float32 dask.array<chunksize=(304, 71), meta=np.ndarray>
        ...                                     ...
        wind_speed_karin                       (num_lines, num_pixels) float32 dask.array<chunksize=(304, 71), meta=np.ndarray>
        wind_speed_karin_2                     (num_lines, num_pixels) float32 dask.array<chunksize=(304, 71), meta=np.ndarray>
        wind_speed_model_u                     (num_lines, num_pixels) float32 dask.array<chunksize=(304, 71), meta=np.ndarray>
        wind_speed_model_v                     (num_lines, num_pixels) float32 dask.array<chunksize=(304, 71), meta=np.ndarray>
        wind_speed_rad                         (num_lines, num_sides, num_pixels) float32 dask.array<chunksize=(304, 2, 71), meta=np.ndarray>
        x_factor                               (num_lines, num_pixels) float32 dask.array<chunksize=(304, 71), meta=np.ndarray>
    Attributes: (12/32)
        Conventions:                CF-1.7
        contact:                    CNES aviso@altimetry.fr, JPL podaac@podaac.jp...
        cycle_number:               5
        ellipsoid_flattening:       0.003352810664781205
        ellipsoid_semi_major_axis:  6378137.0
        equator_longitude:          206.06188772087626
        ...                         ...
        right_last_longitude:       289.6585533138908
        source:                     Simulate product
        time_coverage_end:          2014-07-24T10:18:18.533147Z
        time_coverage_start:        2014-07-24T09:26:52.109265Z
        title:                      Level 2 Low Rate Sea Surface Height Data Prod...
        wavelength:                 0.008385803020979
  • Conventions :
    CF-1.7
    contact :
    CNES aviso@altimetry.fr, JPL podaac@podaac.jpl.nasa.gov
    cycle_number :
    5
    ellipsoid_flattening :
    0.003352810664781205
    ellipsoid_semi_major_axis :
    6378137.0
    equator_longitude :
    206.06188772087626
    equator_time :
    2014-07-24T09:52:36.962185Z
    geospatial_lat_max :
    78.29189230598696
    geospatial_lat_min :
    -78.29203183241484
    geospatial_lon_max :
    289.6585533138908
    geospatial_lon_min :
    122.7028213028531
    history :
    2021-09-10 10:00:06Z : Creation
    institution :
    CNES/JPL
    left_first_latitude :
    -77.032982418125
    left_first_longitude :
    122.7028213028531
    left_last_latitude :
    78.29189230598696
    left_last_longitude :
    289.65746162390826
    orbit_solution :
    POE
    pass_number :
    545
    platform :
    SWOT
    product_version :
    1.1.0.dev33
    reference_document :
    D-56407_SWOT_Product_Description_L2_LR_SSH
    references :
    Gaultier, L., C. Ubelmann, and L.-L. Fu, 2016: The Challenge of Using Future SWOT Data for Oceanic Field Reconstruction. J. Atmos. Oceanic Technol., 33, 119-126, doi:10.1175/jtech-d-15-0160.1. http://dx.doi.org/10.1175/JTECH-D-15-0160.1.
    right_first_latitude :
    -78.29203183241484
    right_first_longitude :
    122.70935482261133
    right_last_latitude :
    77.03284214129418
    right_last_longitude :
    289.6585533138908
    source :
    Simulate product
    time_coverage_end :
    2014-07-24T10:18:18.533147Z
    time_coverage_start :
    2014-07-24T09:26:52.109265Z
    title :
    Level 2 Low Rate Sea Surface Height Data Product - Expert SSH with Wind and Wave
    wavelength :
    0.008385803020979
  • %%time
    ax = plt.axes(projection=ccrs.PlateCarree())
    plt.figure(figsize=(21, 15))
    cropped_ds.simulated_true_ssh_karin.plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree(), x="longitude", y="latitude", cmap='jet', add_colorbar=True)
    CPU times: user 146 ms, sys: 14.6 ms, total: 161 ms
    Wall time: 638 ms

    <Figure size 1512x1080 with 0 Axes>

    Example: Plot 21-day cycle subset on the Southern Ocean

    The coordinate latitude_nadir has dimensions (z, num_lines) since it only measures one pixel per line. Subsetting over this latitude is much faster and more memory efficient since there is less data to look over then KaRIn’s latitude’s (z, num_lines, num_pixels)

    %%time
    one_cycle = data.sel(cycle_num=3)
    data_so = one_cycle.where((one_cycle.latitude_nadir <= -30).compute(), drop=True)
    data_so
    CPU times: user 3.8 s, sys: 157 ms, total: 3.95 s
    Wall time: 20 s
    <xarray.Dataset>
    Dimensions:                                (pass_num: 456, num_lines: 6491,
                                                num_pixels: 71, num_sides: 2)
    Coordinates:
        cycle_num                              float64 3.0
        latitude                               (pass_num, num_lines, num_pixels) float64 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        latitude_nadir                         (pass_num, num_lines) float64 dask.array<chunksize=(1, 6491), meta=np.ndarray>
        longitude                              (pass_num, num_lines, num_pixels) float64 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        longitude_nadir                        (pass_num, num_lines) float64 dask.array<chunksize=(1, 6491), meta=np.ndarray>
      * pass_num                               (pass_num) float64 1.0 2.0 ... 583.0
    Dimensions without coordinates: num_lines, num_pixels, num_sides
    Data variables: (12/91)
        ancillary_surface_classification_flag  (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        correction_flag                        (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        cross_track_angle                      (pass_num, num_lines) float64 dask.array<chunksize=(1, 6491), meta=np.ndarray>
        cross_track_distance                   (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        dac                                    (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        depth_or_elevation                     (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        ...                                     ...
        wind_speed_karin                       (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        wind_speed_karin_2                     (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        wind_speed_model_u                     (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        wind_speed_model_v                     (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
        wind_speed_rad                         (pass_num, num_lines, num_sides) float32 dask.array<chunksize=(1, 6491, 2), meta=np.ndarray>
        x_factor                               (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 6491, 71), meta=np.ndarray>
    Attributes: (12/32)
        Conventions:                CF-1.7
        contact:                    CNES aviso@altimetry.fr, JPL podaac@podaac.jp...
        cycle_number:               5
        ellipsoid_flattening:       0.003352810664781205
        ellipsoid_semi_major_axis:  6378137.0
        equator_longitude:          206.06188772087626
        ...                         ...
        right_last_longitude:       289.6585533138908
        source:                     Simulate product
        time_coverage_end:          2014-07-24T10:18:18.533147Z
        time_coverage_start:        2014-07-24T09:26:52.109265Z
        title:                      Level 2 Low Rate Sea Surface Height Data Prod...
        wavelength:                 0.008385803020979
  • Conventions :
    CF-1.7
    contact :
    CNES aviso@altimetry.fr, JPL podaac@podaac.jpl.nasa.gov
    cycle_number :
    5
    ellipsoid_flattening :
    0.003352810664781205
    ellipsoid_semi_major_axis :
    6378137.0
    equator_longitude :
    206.06188772087626
    equator_time :
    2014-07-24T09:52:36.962185Z
    geospatial_lat_max :
    78.29189230598696
    geospatial_lat_min :
    -78.29203183241484
    geospatial_lon_max :
    289.6585533138908
    geospatial_lon_min :
    122.7028213028531
    history :
    2021-09-10 10:00:06Z : Creation
    institution :
    CNES/JPL
    left_first_latitude :
    -77.032982418125
    left_first_longitude :
    122.7028213028531
    left_last_latitude :
    78.29189230598696
    left_last_longitude :
    289.65746162390826
    orbit_solution :
    POE
    pass_number :
    545
    platform :
    SWOT
    product_version :
    1.1.0.dev33
    reference_document :
    D-56407_SWOT_Product_Description_L2_LR_SSH
    references :
    Gaultier, L., C. Ubelmann, and L.-L. Fu, 2016: The Challenge of Using Future SWOT Data for Oceanic Field Reconstruction. J. Atmos. Oceanic Technol., 33, 119-126, doi:10.1175/jtech-d-15-0160.1. http://dx.doi.org/10.1175/JTECH-D-15-0160.1.
    right_first_latitude :
    -78.29203183241484
    right_first_longitude :
    122.70935482261133
    right_last_latitude :
    77.03284214129418
    right_last_longitude :
    289.6585533138908
    source :
    Simulate product
    time_coverage_end :
    2014-07-24T10:18:18.533147Z
    time_coverage_start :
    2014-07-24T09:26:52.109265Z
    title :
    Level 2 Low Rate Sea Surface Height Data Product - Expert SSH with Wind and Wave
    wavelength :
    0.008385803020979
  • Using .coarsen()

    %%time
    data_so = data_so.coarsen({'num_pixels': 3, 'num_lines': 4}, boundary='trim').mean()
    data_so
    CPU times: user 1.22 s, sys: 53.4 ms, total: 1.28 s
    Wall time: 1.42 s
    <xarray.Dataset>
    Dimensions:                                (pass_num: 456, num_lines: 1622,
                                                num_pixels: 23, num_sides: 2)
    Coordinates:
        cycle_num                              float64 3.0
        latitude                               (pass_num, num_lines, num_pixels) float64 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        latitude_nadir                         (pass_num, num_lines) float64 dask.array<chunksize=(1, 1622), meta=np.ndarray>
        longitude                              (pass_num, num_lines, num_pixels) float64 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        longitude_nadir                        (pass_num, num_lines) float64 dask.array<chunksize=(1, 1622), meta=np.ndarray>
      * pass_num                               (pass_num) float64 1.0 2.0 ... 583.0
    Dimensions without coordinates: num_lines, num_pixels, num_sides
    Data variables: (12/91)
        ancillary_surface_classification_flag  (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        correction_flag                        (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        cross_track_angle                      (pass_num, num_lines) float64 dask.array<chunksize=(1, 1622), meta=np.ndarray>
        cross_track_distance                   (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        dac                                    (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        depth_or_elevation                     (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        ...                                     ...
        wind_speed_karin                       (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        wind_speed_karin_2                     (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        wind_speed_model_u                     (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        wind_speed_model_v                     (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
        wind_speed_rad                         (pass_num, num_lines, num_sides) float32 dask.array<chunksize=(1, 1622, 2), meta=np.ndarray>
        x_factor                               (pass_num, num_lines, num_pixels) float32 dask.array<chunksize=(1, 1622, 23), meta=np.ndarray>
    Attributes: (12/32)
        Conventions:                CF-1.7
        contact:                    CNES aviso@altimetry.fr, JPL podaac@podaac.jp...
        cycle_number:               5
        ellipsoid_flattening:       0.003352810664781205
        ellipsoid_semi_major_axis:  6378137.0
        equator_longitude:          206.06188772087626
        ...                         ...
        right_last_longitude:       289.6585533138908
        source:                     Simulate product
        time_coverage_end:          2014-07-24T10:18:18.533147Z
        time_coverage_start:        2014-07-24T09:26:52.109265Z
        title:                      Level 2 Low Rate Sea Surface Height Data Prod...
        wavelength:                 0.008385803020979
  • Conventions :
    CF-1.7
    contact :
    CNES aviso@altimetry.fr, JPL podaac@podaac.jpl.nasa.gov
    cycle_number :
    5
    ellipsoid_flattening :
    0.003352810664781205
    ellipsoid_semi_major_axis :
    6378137.0
    equator_longitude :
    206.06188772087626
    equator_time :
    2014-07-24T09:52:36.962185Z
    geospatial_lat_max :
    78.29189230598696
    geospatial_lat_min :
    -78.29203183241484
    geospatial_lon_max :
    289.6585533138908
    geospatial_lon_min :
    122.7028213028531
    history :
    2021-09-10 10:00:06Z : Creation
    institution :
    CNES/JPL
    left_first_latitude :
    -77.032982418125
    left_first_longitude :
    122.7028213028531
    left_last_latitude :
    78.29189230598696
    left_last_longitude :
    289.65746162390826
    orbit_solution :
    POE
    pass_number :
    545
    platform :
    SWOT
    product_version :
    1.1.0.dev33
    reference_document :
    D-56407_SWOT_Product_Description_L2_LR_SSH
    references :
    Gaultier, L., C. Ubelmann, and L.-L. Fu, 2016: The Challenge of Using Future SWOT Data for Oceanic Field Reconstruction. J. Atmos. Oceanic Technol., 33, 119-126, doi:10.1175/jtech-d-15-0160.1. http://dx.doi.org/10.1175/JTECH-D-15-0160.1.
    right_first_latitude :
    -78.29203183241484
    right_first_longitude :
    122.70935482261133
    right_last_latitude :
    77.03284214129418
    right_last_longitude :
    289.6585533138908
    source :
    Simulate product
    time_coverage_end :
    2014-07-24T10:18:18.533147Z
    time_coverage_start :
    2014-07-24T09:26:52.109265Z
    title :
    Level 2 Low Rate Sea Surface Height Data Product - Expert SSH with Wind and Wave
    wavelength :
    0.008385803020979
  • Example: Plotting granules

    map_proj = ccrs.PlateCarree()
    fig = plt.figure(figsize=[18, 12])  # inches
    # fig.suptitle('SWOT L2 Sea Surface Height (m)', fontsize=16)
    ax = plt.subplot(projection=map_proj)
    ax.set_facecolor("gray")
    # ax.set_extent([-180, 180, -90, -39.4], ccrs.PlateCarree())
    # fig.subplots_adjust(bottom=0.05, top=0.95, left=0.04, right=0.95, wspace=0.02)
    ax.add_feature(cartopy.feature.LAND, zorder=10, facecolor='w')
    ax.add_feature(cartopy.feature.COASTLINE, zorder=10)
    # ax.gridlines(xlocs=[-130, -60, 20, 90, 160], draw_labels=True)
    
    for pass_num in tqdm(data.pass_num.values[::2]):
        data.isel(cycle_num=0).sel(pass_num=pass_num).simulated_true_ssh_karin.plot.pcolormesh(x='longitude', y='latitude', ax=ax, cmap='jet', transform=ccrs.PlateCarree(), add_colorbar=False)

    Export to S3 bucket or local download

    data.to_netcdf("SWOT_spatial_subset.nc", mode='w')
    data.to_zarr("SWOT_spatial_subset.zarr", chunks={}, mode='w', consolidated=True)
    s3_sandbox = s3fs.S3FileSystem(key="", secret="")
    store = s3fs.S3Map(root="s3://.../swot_21_day.zarr", s3=s3_sandbox)
    data.to_zarr(store, chunks={}, mode='w', consolidated=True)

    Appendix/Extras

    Open with test Zarr

    f = open("s3-creds.json", "r")
    creds = ujson.load(f)
    f.close()
    s3_sandbox = s3fs.S3FileSystem(key=creds["AWS_ACCESS_KEY_ID"], secret=creds["AWS_SECRET_ACCESS_KEY"])
    store = s3fs.S3Map(root="s3://.../swot_zarr_21_day_mod.zarr", s3=s3_sandbox)
    %time data = xr.open_zarr(store, chunks={}, consolidated=True)
    data
    CPU times: user 275 ms, sys: 28.1 ms, total: 303 ms
    Wall time: 739 ms
    <xarray.Dataset>
    Dimensions:                                (z: 616, num_lines: 9864,
                                                num_pixels: 71, num_sides: 2)
    Coordinates:
        latitude                               (z, num_lines, num_pixels) float64 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
        latitude_nadir                         (z, num_lines) float64 dask.array<chunksize=(1, 9864), meta=np.ndarray>
        longitude                              (z, num_lines, num_pixels) float64 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
        longitude_nadir                        (z, num_lines) float64 dask.array<chunksize=(1, 9864), meta=np.ndarray>
    Dimensions without coordinates: z, num_lines, num_pixels, num_sides
    Data variables: (12/93)
        ancillary_surface_classification_flag  (z, num_lines, num_pixels) float32 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
        correction_flag                        (z, num_lines, num_pixels) float32 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
        cross_track_angle                      (z, num_lines) float64 dask.array<chunksize=(1, 9864), meta=np.ndarray>
        cross_track_distance                   (z, num_lines, num_pixels) float32 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
        cycle_number                           (z) int16 dask.array<chunksize=(616,), meta=np.ndarray>
        dac                                    (z, num_lines, num_pixels) float32 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
        ...                                     ...
        wind_speed_karin                       (z, num_lines, num_pixels) float32 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
        wind_speed_karin_2                     (z, num_lines, num_pixels) float32 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
        wind_speed_model_u                     (z, num_lines, num_pixels) float32 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
        wind_speed_model_v                     (z, num_lines, num_pixels) float32 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
        wind_speed_rad                         (z, num_lines, num_sides) float32 dask.array<chunksize=(1, 9864, 2), meta=np.ndarray>
        x_factor                               (z, num_lines, num_pixels) float32 dask.array<chunksize=(1, 9864, 71), meta=np.ndarray>
    Attributes: (12/32)
        Conventions:                CF-1.7
        contact:                    CNES aviso@altimetry.fr, JPL podaac@podaac.jp...
        cycle_number:               4
        ellipsoid_flattening:       0.003352810664781205
        ellipsoid_semi_major_axis:  6378137.0
        equator_longitude:          225.18862751249918
        ...                         ...
        right_last_longitude:       308.65044766455395
        source:                     Simulate product
        time_coverage_end:          2014-07-01T00:40:37.846974Z
        time_coverage_start:        2014-06-30T23:49:11.421067Z
        title:                      Level 2 Low Rate Sea Surface Height Data Prod...
        wavelength:                 0.008385803020979
  • Conventions :
    CF-1.7
    contact :
    CNES aviso@altimetry.fr, JPL podaac@podaac.jpl.nasa.gov
    cycle_number :
    4
    ellipsoid_flattening :
    0.003352810664781205
    ellipsoid_semi_major_axis :
    6378137.0
    equator_longitude :
    225.18862751249918
    equator_time :
    2014-07-01T00:14:52.594023Z
    geospatial_lat_max :
    78.29160065866016
    geospatial_lat_min :
    -78.29158160448438
    geospatial_lon_max :
    308.65175879384276
    geospatial_lon_min :
    141.69602279630138
    history :
    2021-09-10 09:59:15Z : Creation
    institution :
    CNES/JPL
    left_first_latitude :
    78.29160065866016
    left_first_longitude :
    141.7023394464077
    left_last_latitude :
    -77.03253144932863
    left_last_longitude :
    308.65175879384276
    orbit_solution :
    POE
    pass_number :
    474
    platform :
    SWOT
    product_version :
    1.1.0.dev33
    reference_document :
    D-56407_SWOT_Product_Description_L2_LR_SSH
    references :
    Gaultier, L., C. Ubelmann, and L.-L. Fu, 2016: The Challenge of Using Future SWOT Data for Oceanic Field Reconstruction. J. Atmos. Oceanic Technol., 33, 119-126, doi:10.1175/jtech-d-15-0160.1. http://dx.doi.org/10.1175/JTECH-D-15-0160.1.
    right_first_latitude :
    77.03255119402299
    right_first_longitude :
    141.69602279630138
    right_last_latitude :
    -78.29158160448438
    right_last_longitude :
    308.65044766455395
    source :
    Simulate product
    time_coverage_end :
    2014-07-01T00:40:37.846974Z
    time_coverage_start :
    2014-06-30T23:49:11.421067Z
    title :
    Level 2 Low Rate Sea Surface Height Data Product - Expert SSH with Wind and Wave
    wavelength :
    0.008385803020979
  • Create shapefile for swath

    %%time
    g = 0
    lon = np.concatenate([data['longitude'][g, :, 0], data['longitude'][g, :, 70][::-1]])
    lat = np.concatenate([data['latitude'][g, :, 0], data['latitude'][g, :, 70][::-1]])
    Polygon(zip(lon, lat))
    CPU times: user 140 ms, sys: 11.6 ms, total: 151 ms
    Wall time: 643 ms

    Edge case (to be fixed) - Needs to handle case where longitude jumps from 0 to 360. The swath is not numerically continous (longitude). Probably create MultiPolygon at split point - Split point is when values go from (+) -> (-) or (-) to (+) split_points = np.where(lon_left[1:] <= lon_left[:-1])[0]

    %%time
    from shapely.geometry import Polygon
    g = 3
    lon = np.concatenate([data['longitude'][g, :, 0], data['longitude'][g, :, 70][::-1]])
    lat = np.concatenate([data['latitude'][g, :, 0], data['latitude'][g, :, 70][::-1]])
    Polygon(zip(lon, lat))
    CPU times: user 146 ms, sys: 5.43 ms, total: 152 ms
    Wall time: 638 ms

    Interactive plot of swath shapefiles

    %%time
    z = 0
    granules = 5
    swaths, passes = [], []
    for i in tqdm(range(granules)):
        lon = np.concatenate([data['longitude'][i, :, 0], data['longitude'][i, :, 70][::-1]])
        lat = np.concatenate([data['latitude'][i, :, 0], data['latitude'][i, :, 70][::-1]])
        swaths.append(Polygon(zip(lon, lat)))
        passes.append(i)
    data_df = pd.DataFrame({"indexer": np.arange(granules), "pass": passes})
    swaths_gdf = gpd.GeoDataFrame(data_df, geometry=swaths)
    swaths_gdf.hvplot.polygons(
        coastline=True,
        geo=True,
        alpha=0.3,
        project=True,
        width=650,
        height=450,
    )